1. INTRODUCTION

Breast cancer is the most common cancer in women worldwide, with nearly 1.7 million new
cases diagnosed in 2012 [1]. Screening for abnormal tissue with X-ray mammography is currently the
most effective method for early detection of the disease [2-3]. The introduction of digital mammography
has led to a growing number of commercial Computer Aided Detection (CAD) systems, which have
significantly enhanced radiologists’ ability to detect and diagnose cancer and to take early
action [4]. One problem with CAD systems is the large number of false positive (FP) marks produced
when high sensitivity is required [5]. Too many false positives may confuse the radiologist. Great
effort has been devoted in recent years to the development of CAD systems that use many features to
reduce false positives [6]. However, many of these features are not key features of masses, and they
lead to a high-dimensional classification problem.

In this paper, we introduce a novel method that uses moment and basic characteristics of the masses.
Block Difference Inverse Probability (BDIP) and basic features are calculated at multiple
resolutions. Once the features are extracted, random projection [7] and k nearest neighbor (kNN) [8]
with distance weighting are used to classify the suspicious areas as real mass or normal parenchyma.


2. PROPOSED METHOD
2.1. Database 

In this study, we use the Mini-MIAS mammogram database [9] to test the proposed method. MIAS is
the public database of the Mammographic Image Analysis Society, an organization of United Kingdom research
groups. The database includes 322 mammograms from 161 patients. Films taken from the United Kingdom
National Breast Screening Program were digitized to a 50-micron pixel edge, with each pixel represented
by an 8-bit word. Every image in the database comes with extra information, or ground truth, provided by
the radiologists as shown in Figure 1: the characteristic of the background tissue, the type of abnormality
present, the severity of the abnormality, and the coordinates of the center and approximate radius (in
pixels) of a circle enclosing the abnormality. The Mini-MIAS database is a reduced version of the original
MIAS database: the images have been reduced from a 50-micron to a 200-micron pixel edge and clipped/padded
so that every image has a size of 1024 x 1024 pixels.





Figure 1. The red line shows the ground truth in the Mini-MIAS database


2.2. Preprocessing 

The aim of this step is to remove unnecessary information from the mammograms, such as the label, the
pectoral muscle and other noise. To separate the breast region from the image label, we simply threshold
the image and keep the largest thresholded region. The pectoral muscle appears in a mammographic image as
a predominantly dense region, which can negatively affect the result of the detection method [10]. For
this reason, the region representing the pectoral muscle should be eliminated. The mammogram also contains
some small bright spots whose gray level is close to that of a circumscribed mass. Median filtering with a
3x3 window is applied to eliminate these spots, as illustrated in Figure 2.
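
The effect of the 3x3 median filter on an isolated bright spot can be illustrated with a minimal sketch in plain Python. The function name `median_filter_3x3` and the border handling (windows are clipped at the image edge) are assumptions made for illustration; the text does not specify them, and this is not the implementation used in the study.

```python
from statistics import median_low

def median_filter_3x3(image):
    """Return a 3x3 median-filtered copy of `image` (a list of rows of grey levels).

    Assumption: at the border the window is clipped to the pixels that lie
    inside the image. median_low keeps the output on the original grey levels.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            out[r][c] = median_low(window)
    return out

# A flat 5x5 patch with one isolated bright pixel in the centre.
patch = [[10] * 5 for _ in range(5)]
patch[2][2] = 250
filtered = median_filter_3x3(patch)
print(filtered[2][2])  # 10: the isolated spot is replaced by the local median
```

Because a single bright pixel never forms the majority of any 3x3 window, it is removed, while larger structures such as masses are largely preserved.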





Figure 2. Original (left) and preprocessed (right) mammograms 


2.3. Mass detection 

In this stage, suspicious regions are extracted from the preprocessed mammogram. The radiologists
should focus their attention on these extracted regions. The steps of this procedure are fully described in [11].
Detected regions of interest (ROIs) are marked as true positive ROIs (TP-ROIs) or false positive ROIs
(FP-ROIs) based on the provided ground truth, as illustrated in Figure 3.


Figure 3. Detected ROIs (green) and ground truth (red)


2.4. Feature extraction 

In human vision, edges and valleys [12] in an image are very important features; valleys in particular
are fundamental to the visual perception of an object's shape [13-14]. Block Difference Inverse Probability
(BDIP) is a texture feature that measures the variation of intensities in an image block. It effectively
extracts edges and valleys. The larger the variation of intensity or the size of the block, the higher the
value of BDIP [15]. The BDIP of a block of size WxW is defined as:


\[
\mathrm{BDIP} = W^2 - \frac{\sum_{(i,j) \in B} I(i,j)}{\max_{(i,j) \in B} I(i,j)}
\]


where I(i,j) denotes the intensity of the pixel (i,j) in the block B.
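
As a minimal illustration of this definition, the following sketch computes BDIP for a small block in plain Python. The function name `bdip` and the treatment of an all-zero block are assumptions; the number of pixels in the block plays the role of W^2.

```python
def bdip(block):
    """BDIP of a block given as a list of rows of grey levels.

    Number of pixels minus (sum of intensities / maximum intensity).
    """
    pixels = [p for row in block for p in row]
    peak = max(pixels)
    if peak == 0:
        return 0.0  # assumption: an all-zero block is treated as having no variation
    return len(pixels) - sum(pixels) / peak

flat = [[100, 100], [100, 100]]   # uniform block
edge = [[100, 100], [100, 50]]    # one darker pixel (a small valley)
print(bdip(flat))  # 0.0
print(bdip(edge))  # 0.5  (4 - 350/100)
```

A uniform block gives 0, and the value grows as more pixels fall below the block maximum.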

Because the detected ROI is not of size WxW, we substitute the term W^2 in the above equation with the
size (number of pixels) of the ROI to calculate the BDIP feature at the first resolution, which is simply
called BDIP. BDIP features at other resolutions are calculated as follows:

a. Divide each side of the minimal rectangle that contains the ROI by 2, 3, ..., n to get 4, 9, ..., n^2 blocks.

b. For each block, use the above equation to calculate the BDIP features, which are called BDIP2x2,
BDIP3x3, ..., BDIPnxn.

c. The mean and variance of the block BDIPs are used as BDIP features for each ROI. They are
BDIP2x2mean, BDIP2x2var, BDIP3x3mean, BDIP3x3var, ..., BDIPnxnmean, BDIPnxnvar,
respectively.
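
Steps a to c can be sketched as follows in plain Python. This is an illustration under stated assumptions, not the implementation used in the study: the ROI is represented by its bounding rectangle only, the block boundaries are obtained by integer division, and the population variance is used because the text does not say which variance is meant. The names `multires_bdip` and `split_edges` are hypothetical.

```python
from statistics import mean, pvariance

def bdip(pixels):
    """BDIP of a flat list of grey levels (pixel count replaces W^2)."""
    peak = max(pixels)
    return 0.0 if peak == 0 else len(pixels) - sum(pixels) / peak

def split_edges(length, n):
    """Boundaries that cut `length` pixels into n nearly equal parts."""
    return [k * length // n for k in range(n + 1)]

def multires_bdip(rect, n):
    """Mean and variance of BDIP over the n x n blocks of a rectangle.

    Assumes n does not exceed the height or width of the rectangle.
    """
    r_edges = split_edges(len(rect), n)
    c_edges = split_edges(len(rect[0]), n)
    values = []
    for i in range(n):
        for j in range(n):
            pixels = [
                rect[r][c]
                for r in range(r_edges[i], r_edges[i + 1])
                for c in range(c_edges[j], c_edges[j + 1])
            ]
            values.append(bdip(pixels))
    return mean(values), pvariance(values)

rect = [
    [100, 100, 50, 50],
    [100, 100, 50, 100],
    [20, 20, 80, 80],
    [20, 20, 80, 80],
]
mean_2x2, var_2x2 = multires_bdip(rect, 2)   # BDIP2x2mean, BDIP2x2var
print(mean_2x2, var_2x2)  # 0.375 0.421875
```

Only the top-right block contains intensity variation, so it alone contributes a non-zero BDIP (1.5), which yields the mean and variance shown.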

In addition, we compute basic features of each ROI:

a. Mean: the average grey level 

b. Var: the standard deviation of grey level 

c. Max: the highest grey level 

d. Min: the lowest grey level 
However, high or low intensity values are not absolute, because input images often differ in brightness.
We therefore propose two extra features to make the algorithm more robust:

a. Ratio_1: Mean/Max 

b. Ratio_2: Max/Max_I 
where Max_I is the highest gray level of the whole image. 
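
The basic features and the two ratios can be sketched in a few lines of plain Python. The function name `basic_features` and the sample grey levels are hypothetical; following the definition above, "Var" is computed as a standard deviation (the population form is an assumption).

```python
from statistics import mean, pstdev

def basic_features(roi_pixels, image_max):
    """Basic grey-level features of an ROI given as a flat list of pixels.

    Assumes the ROI and the image are not entirely black (non-zero maxima).
    """
    mx = max(roi_pixels)
    mu = mean(roi_pixels)
    return {
        "Mean": mu,
        "Var": pstdev(roi_pixels),   # standard deviation, as defined in the text
        "Max": mx,
        "Min": min(roi_pixels),
        "Ratio_1": mu / mx,          # Mean/Max
        "Ratio_2": mx / image_max,   # Max/Max_I
    }

bright = basic_features([100, 150, 200, 150], image_max=250)
dim = basic_features([50, 75, 100, 75], image_max=125)   # same ROI at half brightness
print(bright["Mean"], dim["Mean"])        # 150 75
print(bright["Ratio_1"], dim["Ratio_1"])  # 0.75 0.75
print(bright["Ratio_2"], dim["Ratio_2"])  # 0.8 0.8
```

Halving the brightness of the whole image halves Mean, Max and Min, but leaves Ratio_1 and Ratio_2 unchanged, which is the motivation for adding the two ratios.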

Multi-resolution basic features are calculated in the same manner as the multi-resolution BDIP features.


2.5. Random Projection 

In mathematics and statistics, random projection is a technique used to reduce the dimensionality of
a set of points that lie in Euclidean space. Random projection methods are powerful methods known for
their simplicity and low error compared with other methods. Experimental results indicate that random
projection preserves distances well, although empirical results are sparse [15]. In random projection, the
original D-dimensional data is projected onto an L-dimensional subspace (L << D):

\[
X_{L \times N} = R_{L \times D} \, X_{D \times N}
\]

where X_{L x N} and X_{D x N} denote the output and input matrices and R_{L x D} is a random projection matrix.
The random matrix R can be generated using a Gaussian distribution. Achlioptas [15] has shown 
that the Gaussian distribution can be replaced by a much simpler distribution such as: 


\[
R_{i,j} = \sqrt{3} \times
\begin{cases}
+1 & \text{with probability } 1/6 \\
0 & \text{with probability } 2/3 \\
-1 & \text{with probability } 1/6
\end{cases}
\]
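
A minimal sketch of this sparse random projection in plain Python is given below. The function names, the toy sizes and the fixed seed are assumptions chosen for illustration. Many presentations additionally scale the projected data by 1/sqrt(L); the text does not state this, so it is omitted here.

```python
import math
import random

def achlioptas_matrix(L, D, rng):
    """L x D matrix with entries sqrt(3) * {+1, 0, -1},
    drawn with probabilities 1/6, 2/3 and 1/6."""
    scale = math.sqrt(3)
    return [
        [scale * rng.choices((1, 0, -1), weights=(1, 4, 1))[0] for _ in range(D)]
        for _ in range(L)
    ]

def project(R, X):
    """Compute R (L x D) times X (D x N); X is stored as a list of D rows."""
    columns = list(zip(*X))  # N columns, each of length D
    return [[sum(r * x for r, x in zip(row, col)) for col in columns] for row in R]

rng = random.Random(0)   # fixed seed only to make the sketch repeatable
D, L, N = 12, 4, 3       # toy sizes, far smaller than a real feature matrix
X = [[rng.random() for _ in range(N)] for _ in range(D)]
R = achlioptas_matrix(L, D, rng)
Y = project(R, X)
print(len(Y), len(Y[0]))  # 4 3
```

About two thirds of the entries of R are zero, so most multiplications can be skipped in an optimized implementation, which is the practical appeal of this distribution over a Gaussian one.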


2.6. K Nearest Neighbor 
Let T = {(x_i, y_i): i = 1, ..., N} denote the training set, where x_i is a training vector in the
m-dimensional feature space and y_i is the corresponding class label. Given an unknown x’, its class y’ is
assigned in two steps:
a. First, a set of k labelled target neighbours of x’ is identified and sorted in ascending order
of Euclidean distance to x’.
b. Second, the class label y’ is predicted by majority voting among its nearest neighbours.

A weighted voting scheme for kNN, called the distance-weighted k nearest neighbor (WkNN) rule, is
proposed in [16]. In WkNN, closer neighbors are weighted more heavily than farther ones by means of a
distance-weighted function, and the classification of the query is decided by the weighted majority
vote. The nearest neighbor gets a weight of 1, the furthest neighbor a weight of 0, and the other weights
are scaled linearly in the interval in between.
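
The linear weighting described above corresponds to w_i = (d_k - d_i) / (d_k - d_1), where d_1 and d_k are the distances to the nearest and furthest of the k neighbors. The following plain-Python sketch illustrates the rule on toy data. The function name `wknn_predict`, the toy vectors and the choice of weight 1 when all k distances are equal are assumptions for illustration.

```python
import math
from collections import defaultdict

def wknn_predict(train, query, k):
    """Distance-weighted kNN vote.

    train: list of (vector, label) pairs; query: vector of the same length.
    Returns the label with the largest sum of weights.
    """
    neighbours = sorted(
        ((math.dist(x, query), y) for x, y in train), key=lambda t: t[0]
    )[:k]
    d_near, d_far = neighbours[0][0], neighbours[-1][0]
    votes = defaultdict(float)
    for d, y in neighbours:
        # Nearest neighbour gets 1, furthest gets 0, linear in between.
        # Assumption: if all k distances are equal, every weight is 1.
        w = 1.0 if d_far == d_near else (d_far - d) / (d_far - d_near)
        votes[y] += w
    return max(votes, key=votes.get)

train = [((1.0, 0.0), "mass"), ((0.0, 0.0), "normal"), ((3.0, 0.0), "normal")]
query = (0.9, 0.0)
print(wknn_predict(train, query, k=3))  # mass
```

With k = 3, an unweighted majority vote would return "normal" (two votes against one), whereas the weighted vote returns "mass" because the single mass sample is much closer to the query.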


3. RESULTS 

The number of detected ROIs is 1000 [11]. For each ROI, BDIP and basic features are calculated at n
levels. The maximal value of n is the minimal radius of a circle enclosing the abnormality provided in the
Mini-MIAS database. In total we have 2400 features. Different values of K are tested, and the value of K
that gives the highest sensitivity is selected. Figure 4 shows the performance for different values of K.
The selected value of K is 21, with a sensitivity of 90%.
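
For reference, the two performance measures and the selection of K can be written as a short sketch. It uses the standard definitions (sensitivity is the percentage of true masses detected; false positives per image is the FP count divided by the number of images). All counts and scores below are hypothetical values for illustration, not results of this study, and the tie-breaking rule is an assumption.

```python
def sensitivity(tp, fn):
    """Percentage of true masses that are detected."""
    return 100.0 * tp / (tp + fn)

def false_positives_per_image(fp, n_images):
    """Average number of false positive marks per image."""
    return fp / n_images

def select_k(sensitivity_by_k):
    """Return the K with the highest sensitivity.

    Assumption: when several K values tie, the smallest one is chosen.
    """
    return max(sorted(sensitivity_by_k), key=sensitivity_by_k.get)

# Hypothetical counts and scores, for illustration only.
print(sensitivity(tp=17, fn=3))                        # 85.0
print(false_positives_per_image(fp=30, n_images=20))   # 1.5
scores = {3: 80.0, 7: 86.0, 15: 86.0, 25: 82.0}
print(select_k(scores))                                # 7
```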


Figure 4. Sensitivity (%) for different values of K


Table 1 compares our method with other approaches. Our method provides the highest sensitivity
while keeping the number of false positives per image low (1.04, close to the lowest value in the
table). We also compare the performance in terms of sensitivity, false positives per image, random
projection time and running time for different sizes of the random projection matrix. The results are
given in Table 2. They show that random projection helps to reduce the running time. This tool should
be effective with big data and a large number of features, but on small data it can degrade the other
performance measures.

Table 1. Comparison to other approaches

Approach                                                          Sensitivity (%)   False Positives per Image

Density slicing, texture flow field analysis                      81                Ps
Multi-level threshold segmentation                                80                23
K-means clustering                                                85                1
Multi-resolution features, distance weighted k nearest neighbor   90                1.04



Table 2. Performance with different sizes of the random projection matrix

Size of matrix   Sensitivity (%)   False positives   Time of random projection   Running time
                                   per image         per image (s)               per image (s)

2000x2400        89                1.1               2.1                         19
1500x2400        87                1.2               1.9                         17
1000x2400        85                1.4               1.6                         16
Full             90                1.04              -                           24


4. CONCLUSIONS


This study proposes a new method to detect masses in mammographic images based on a
combination of multi-resolution features and the distance-weighted k nearest neighbor algorithm. The
highest sensitivity is obtained with a small number of false positives per image. Comparisons with
related works show that our method is effective and has the potential to be investigated further.
Random projection should make the method effective with big data. In the future, we will evaluate
the method on a larger set of mammograms and use different features.